Abstract

I present a datatype-generic treatment of recursive container types
whose elements are guaranteed to be stored in increasing order,
with the ordering invariant rolled out systematically. Intervals, lists
and binary search trees are instances of the generic treatment. On
the journey to this treatment, I report a variety of failed experiments
and the transferable learning experiences they triggered. I demonstrate
that a total element ordering is enough to deliver insertion and
flattening algorithms, and show that (with care about the formulation
of the types) the implementations remain as usual. Agda’s instance
arguments and pattern synonyms maximize the proof search
done by the typechecker and minimize the appearance of proofs
in program text, often eradicating them entirely. Generalizing to
indexed recursive container types, invariants such as size and balance
can be expressed in addition to ordering. By way of example, I
implement insertion and deletion for 2-3 trees, ensuring both order
and balance by the discipline of type checking.

Categories and Subject Descriptors D.1.1 [Programming Techniques]:
Applicative (Functional) Programming; D.3.3 [Language
Constructs and Features]: Data types and structures

Keywords dependent types; Agda; ordering; balancing; sorting

1. Introduction 

It has taken years to see what was under my nose. I have been 
experimenting with ordered container structures for a long time 
[12]: how to keep lists ordered, how to keep binary search trees 
ordered, how to flatten the latter to the former. Recently, the pattern 
common to the structures and methods I had often found effective 
became clear to me. Let me tell you about it. Patterns are, of course, 
underarticulated abstractions. Correspondingly, let us construct a 
universe of container-like datatypes ensuring that elements are in 
increasing order, good for intervals, ordered lists, binary search 
trees, and more besides. 

This paper is a literate Agda development, available online
at https://github.com/pigworker/Pivotal. As well as contributing

• a datatype-generic treatment of ordering invariants and operations
which respect them

• a technique for hiding proofs from program texts

• a precise implementation of insertion and deletion for 2-3 trees

I take the time to explore the design space, reporting a selection of
the wrong turnings and false dawns I encountered on my journey to
these results. I try to extrapolate transferable design principles, so
that others in future may suffer less than I.

ICFP ’14, September 1-6, 2014, Gothenburg, Sweden.
Copyright © 2014 ACM 978-1-4503-2873-9/14/09... $15.00.
http://dx.doi.org/10.1145/2628136.2628163

2. How to Hide the Truth 

If we intend to enforce invariants, we shall need to mix a little 
bit of logic in with our types and a little bit of proof in with our 
programming. It is worth taking some trouble to set up our logical 
apparatus to maximize the effort we can get from the computer 
and to minimize the textual cost of proofs. We should prefer to 
encounter logic only when it is dangerously absent! 

Our basic tools are the types representing falsity and truth by 
virtue of their number of inhabitants: 

data 0 : Set where -- no constructors!

record 1 : Set where constructor ⟨⟩ -- no fields!

Dependent types allow us to compute sets from data. E.g., we
can represent evidence for the truth of some Boolean expression
which we might have tested,

data 2 : Set where tt ff : 2
So : 2 → Set
So tt = 1
So ff = 0

A set P which evaluates to 0 or to 1 might be considered 
‘propositional’ in that we are unlikely to want to distinguish its 
inhabitants. We might even prefer not even to see its inhabitants. 
I define a wrapper type for propositions whose purpose is to hide 
proofs. 

record ⌜_⌝ (P : Set) : Set where
  constructor !
  field {{prf}} : P

Agda uses braces to indicate that an argument or field is to be 
suppressed by default in program texts and inferred somehow by 
the typechecker. Single-braced variables are solved by unification, 
in the tradition of Milner [16]. Doubled braces indicate instance
arguments, inferred by contextual search: if just one hypothesis
can take the place of an instance argument, it is silently filled in,
allowing us a tiny bit of proof automation [6]. If an inhabitant of
⌜ So b ⌝ is required, we may write ! to indicate that we expect the
truth of b to be known.

Careful positioning of instance arguments seeds the context 
with useful information. We may hypothesize over them quietly, 

_⇒_ : Set → Set → Set
P ⇒ T = {{p : P}} → T

infixr 3 _⇒_

and support forward reasoning with a ‘therefore’ operator.

_∴_ : ∀ {P T} → ⌜ P ⌝ → (P ⇒ T) → T
! ∴ t = t

This apparatus can give the traditional conditional a subtly more 
informative type, thus: 

¬_ : 2 → 2 ; ¬ tt = ff ; ¬ ff = tt
if_then_else_ :
  ∀ {X} b → (So b ⇒ X) → (So (¬ b) ⇒ X) → X
if tt then t else f = t
if ff then t else f = f
infix 1 if_then_else_
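
To see what the richer type buys, here is a minimal self-contained sketch of the same idea. It is not the code of this paper: it assumes the ASCII names Zero, One and Two for the typeset 0, 1 and 2, the hypothetical names not, cond and flip, and it passes evidence as ordinary function arguments rather than instance arguments, so that nothing depends on instance resolution.

```agda
-- Falsity and truth, by number of inhabitants.
data Zero : Set where

record One : Set where
  constructor <>

data Two : Set where
  tt ff : Two

-- Evidence that a Boolean is true.
So : Two → Set
So tt = One
So ff = Zero

not : Two → Two
not tt = ff
not ff = tt

-- Explicit-evidence variant of the conditional (an assumption of this
-- sketch): each branch is a function which receives proof of the
-- outcome of the test.
cond : {X : Set} (b : Two) → (So b → X) → (So (not b) → X) → X
cond tt t f = t <>
cond ff t f = f <>

-- A use which ignores the evidence behaves like the ordinary conditional.
flip : Two → Two
flip b = cond b (λ _ → ff) (λ _ → tt)
```

The instance-argument version in the text differs only in that the evidence is bound and supplied silently.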


If ever there is a proof of 0 in the context, we should be able to
ask for anything we want. Let us define

magic : {X : Set} → 0 ⇒ X
magic {{()}}

using Agda’s absurd pattern to mark the impossible instance
argument which shows that no value need be returned. E.g.,
if tt then ff else magic : 2.

Instance arguments are not a perfect fit for proof search: they
were intended as a cheap alternative to type classes, hence the
requirement for exactly one candidate instance. For proofs we
might prefer to be less fussy about redundancy, but we shall manage
perfectly well for the purposes of this paper.

3. Barking Up the Wrong Search Trees

David Turner [17] notes that whilst quicksort is often cited as a
program which defies structural recursion, it performs the same
sorting algorithm (although not with the same memory usage pattern)
as building a binary search tree and then flattening it. The
irony is completed by noting that the latter sorting algorithm is the
archetype of structural recursion in Rod Burstall’s development of
the concept [4]. Binary search trees have empty leaves and nodes
labelled with elements which act like pivots in quicksort: the left
subtree stores elements which precede the pivot, the right subtree
elements which follow it. Surely this invariant is crying out to be a
dependent type! Let us search for a type for search trees.

We could, of course, define binary search trees as ordinary node-
labelled trees with parameter P giving the type of pivots:

data Tree : Set where
  leaf : Tree ; node : Tree → P → Tree → Tree

We might then define the invariant as a predicate IsBST : Tree → Set,
implement insertion in our usual way, and prove separately
that our program maintains the invariant. However, the joy of dependently
typed programming is that refining the types of the data
themselves can often alleviate or obviate the burden of proof. Let
us try to bake the invariant in.

What should the type of a subtree tell us? If we want to check
the invariant at a given node, we shall need some information about
the subtrees which we might expect comes from their type. We
require that the elements left of the pivot precede it, so we could
require the whole set of those elements represented somehow, but
of course, for any order worthy of the name, it suffices to check only
the largest. Similarly, we shall need to know the smallest element
of the right subtree. It would seem that we need the type of a search
tree to tell us its extreme elements (or that it is empty).

data STRange : Set where
  ∅ : STRange ; _−_ : P → P → STRange

From checking the invariant to enforcing it. Assuming we can
test the order on P with some le : P → P → 2, we could write
a recursive function to check whether a Tree is a valid search tree
and compute its range if it has one. Of course, we must account for
the possibility of invalidity, so let us admit failure in the customary
manner.

data Maybe (X : Set) : Set where
  yes : X → Maybe X ; no : Maybe X
_?⟩_ : ∀ {X} → 2 → Maybe X → Maybe X
b ?⟩ mx = if b then mx else no
infixr 4 _?⟩_

The guarding operator ?⟩ allows us to attach a Boolean test. We
may now validate the range of a Tree.

valid : Tree → Maybe STRange
valid leaf = yes ∅
valid (node l p r) with valid l | valid r
... | yes ∅ | yes ∅ = yes (p − p)
... | yes ∅ | yes (c − d) = le p c ?⟩ yes (p − d)
... | yes (a − b) | yes ∅ = le b p ?⟩ yes (a − p)
... | yes (a − b) | yes (c − d)
    = le b p ?⟩ le p c ?⟩ yes (a − d)
... | _ | _ = no
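
As a concrete check on the validator, the following self-contained sketch instantiates it at natural numbers. The names Bool, Nat, empty, range and guard are assumptions of the sketch (ASCII stand-ins for the typeset notation), and the guard is an ordinary function rather than the evidence-transmitting operator of the text.

```agda
data Bool : Set where
  true false : Bool

data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

le : Nat → Nat → Bool
le zero    _       = true
le (suc _) zero    = false
le (suc x) (suc y) = le x y

data Tree : Set where
  leaf : Tree
  node : Tree → Nat → Tree → Tree

data STRange : Set where
  empty : STRange
  range : Nat → Nat → STRange

data Maybe (X : Set) : Set where
  yes : X → Maybe X
  no  : Maybe X

-- Attach a Boolean test to a possibly failing computation.
guard : {X : Set} → Bool → Maybe X → Maybe X
guard true  mx = mx
guard false _  = no

-- Compute the range of a tree, failing if the pivots are out of order.
valid : Tree → Maybe STRange
valid leaf = yes empty
valid (node l p r) with valid l | valid r
... | yes empty       | yes empty       = yes (range p p)
... | yes empty       | yes (range c d) = guard (le p c) (yes (range p d))
... | yes (range a b) | yes empty       = guard (le b p) (yes (range a p))
... | yes (range a b) | yes (range c d) =
        guard (le b p) (guard (le p c) (yes (range a d)))
... | _               | _               = no

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

one two : Nat
one = suc zero
two = suc one

-- 1 sits to the left of 2: accepted, with range 1 to 2.
good : valid (node (node leaf one leaf) two leaf) ≡ yes (range one two)
good = refl

-- 2 sits to the left of 1: rejected.
bad : valid (node (node leaf two leaf) one leaf) ≡ no
bad = refl
```

Both equations hold by computation alone, so refl suffices.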


As valid is a fold over the structure of Tree, we can follow my
colleagues Bob Atkey, Neil Ghani and Patricia Johann in computing
the partial refinement [2] of Tree which valid induces. We seek
a type BST : STRange → Set such that BST r = {t : Tree |
valid t = yes r} and we find it by refining the type of each
constructor of Tree with the check performed by the corresponding
case of valid, assuming that the subtrees yielded valid ranges. We
can calculate the conditions to check and the means to compute the
output range if successful.

lOK : STRange → P → 2
lOK ∅ p = tt
lOK (_ − u) p = le u p
rOK : P → STRange → 2
rOK p ∅ = tt
rOK p (l − _) = le p l

outRan : STRange → P → STRange → STRange
outRan ∅ p ∅ = p − p
outRan ∅ p (_ − u) = p − u
outRan (l − _) p ∅ = l − p
outRan (l − _) _ (_ − u) = l − u

We thus obtain the following refinement from Tree to BST:

data BST : STRange → Set where
  leaf : BST ∅
  node : ∀ {l r} → BST l → (p : P) → BST r →
         So (lOK l p) ⇒ So (rOK p r) ⇒ BST (outRan l p r)


Attempting to implement insertion. Now that each binary search 
tree tells us its type, can we implement insertion? Rod Burstall’s 
implementation is as follows 

insert : P → Tree → Tree
insert y leaf = node leaf y leaf
insert y (node lt p rt) =
  if le y p then node (insert y lt) p rt
            else node lt p (insert y rt)

but we shall have to try a little harder to give a type to insert, as
we must somehow negotiate the ranges. If we are inserting a new
extremum, then the range will be wider afterwards than before.

insRan : STRange → P → STRange
insRan ∅ y = y − y
insRan (l − u) y =
  if le y l then y − u else if le u y then l − y else l − u

So, we have the right type for our data and for our program. 
Surely the implementation will go like clockwork! 

insert : ∀ {r} y → BST r → BST (insRan r y)
insert y leaf = node leaf y leaf
insert y (node lt p rt) =
  if le y p then (node (insert y lt) p rt) -- ill typed
            else (node lt p (insert y rt)) -- ill typed


The leaf case checks easily, but alas for node! We have lt :
BST l and rt : BST r for some ranges l and r. The then
branch delivers a BST (outRan (insRan l y) p r), but the
type required is BST (insRan (outRan l p r) y), so we need
some theorem-proving to fix the types, let alone to discharge the
obligation So (lOK (insRan l y) p). We could plough on with
proof and, coughing, push this definition through, but tough work 
ought to make us ponder if we might have thought askew. 

We have defined a datatype which is logically correct but which 
is pragmatically disastrous. Is it thus inevitable that all datatype
definitions which enforce the ordering invariant will be pragmatically
disastrous? Or are there lessons we can learn about dependently
typed programming that will help us to do better?

4. Why Measure When You Can Require? 

Last section, we got the wrong answer because we asked the wrong 
question: “What should the type of a subtree tell us?” somewhat 
presupposes that information bubbles outward from subtrees to the 
nodes which contain them. In Milner’s tradition, we are used to 
synthesizing the type of a thing. Moreover, the very syntax of data 
declarations treats the index delivered from each constructor as an 
output. It seems natural to treat datatype indices as measures of 
the data. That is all very well for the length of a vector, but when 
the measurement is intricate, as when computing a search tree’s 
extrema, programming becomes vexed by the need for theorems 
about the measuring functions. The presence of ‘green slime’— 
defined functions in the return types of constructors—is a danger 
sign. 

We can take an alternative view of types, not as synthesized 
measurements of data, bubbled outward, but as checked requirements
of data, pushed inward. To enforce the invariant, let us rather
ask “What should we tell the type of a subtree?”. 

The elements of the left subtree must precede the pivot in the 
order; those of the right must follow it. Correspondingly, our requirements
on a subtree amount to an interval in which its elements
must fall. As any element can find a place somewhere in a search 
tree, we shall need to consider unbounded intervals also. We can 
extend any type with top and bottom elements as follows. 

data _⊤⊥ (P : Set) : Set where
  ⊤ : P⊤⊥ ; # : P → P⊤⊥ ; ⊥ : P⊤⊥

and extend the order accordingly:

_⊤⊥ : ∀ {P} → (P → P → 2) → P⊤⊥ → P⊤⊥ → 2
le⊤⊥ _ ⊤ = tt
le⊤⊥ (# x) (# y) = le x y
le⊤⊥ ⊥ _ = tt
le⊤⊥ _ _ = ff

We can now index search trees by a pair of loose bounds, not
measuring the range of the contents exactly, but constraining it
sufficiently. At each node, we can require that the pivot falls in the
interval, then use the pivot to bound the subtrees.

data BST (l u : P⊤⊥) : Set where
  leaf : BST l u
  pnode : (p : P) → BST l (# p) → BST (# p) u →
          So (le⊤⊥ l (# p)) ⇒ So (le⊤⊥ (# p) u) ⇒ BST l u

In doing so, we eliminate all the ‘green slime’ from the indices of
the type. The leaf constructor now has many types, indicating that all
its elements satisfy any requirements. We also gain BST ⊥ ⊤ as
the general type of binary search trees for P. Unfortunately, we
have been forced to make the pivot value p the first argument to
pnode, as the type of the subtrees now depends on it. Luckily,
Agda now supports pattern synonyms, allowing linear macros to
abbreviate both patterns on the left and pattern-like expressions on
the right [1]. We may fix up the picture:

pattern node lp p pu = pnode p lp pu

Can we implement insert for this definition? We can certainly 
give it a rather cleaner type. When we insert a new element into the 
left subtree of a node, we must ensure that it precedes the pivot: that 
is, we expect insertion to preserve the bounds of the subtree, and 
we should already know that the new element falls within them. 

insert : ∀ {l u} y → BST l u →
         So (le⊤⊥ l (# y)) ⇒ So (le⊤⊥ (# y) u) ⇒ BST l u
insert y leaf = node leaf y leaf
insert y (node lt p rt) =
  if le y p then node (insert y lt) p rt
            else node lt p (insert y rt)

We have no need to repair type errors by theorem proving, and most 
of our proof obligations follow directly from our assumptions. The 
recursive call in the then branch requires a proof of So (le y p), 
but that is just the evidence delivered by our evidence-transmitting 
conditional. However, the else case snatches defeat from the jaws
of victory: the recursive call needs a proof of So (le p y), but all
we have is a proof of So (¬ (le y p)). For any given total ordering,
we should be able to fix this mismatch up by proving a theorem, but 
this is still more work than I enjoy. The trouble is that we couched 
our definition in terms of the truth of bits computed in a particular 
way, rather than the ordering relation. Let us now tidy up this detail. 

5. One Way Or The Other 

We can recast our definition in terms of relations—families of sets 
Rel P indexed by pairs. 

Rel : Set → Set₁
Rel P = P × P → Set

giving us types which directly make statements about elements of
P, rather than about bits.

I must, of course, say how such pairs are defined: the habit of
dependently typed programmers is to obtain them as the degenerate
case of dependent pairs: let us have them.

record Σ (S : Set) (T : S → Set) : Set where
  constructor _,_
  field
    π₁ : S
    π₂ : T π₁

open Σ

_×_ : Set → Set → Set
S × T = Σ S λ _ → T

infixr 5 _×_ _,_


Now, suppose we have some ‘less or equal’ ordering L : Rel P.
Let us have natural numbers by way of example,

data ℕ : Set where 0 : ℕ ; s : ℕ → ℕ
Lℕ : Rel ℕ
Lℕ (x , y) = x ≤ y where
  _≤_ : ℕ → ℕ → Set
  -- on the left, 0 is the number; the right-hand sides are the types 1 and 0
  0 ≤ y = 1
  s x ≤ 0 = 0
  s x ≤ s y = x ≤ y

The information we shall need is exactly the totality of L: for
any given x and y, L must hold One Way Or The Other, as captured
by the disjoint sum type, OWOTO L (x , y), defined as follows:

data _+_ (S T : Set) : Set where
  ◁ : S → S + T ; ▷ : T → S + T
infixr 4 _+_

OWOTO : ∀ {P} (L : Rel P) → Rel P
OWOTO L (x , y) = ⌜ L (x , y) ⌝ + ⌜ L (y , x) ⌝

pattern le = ◁ !
pattern ge = ▷ !

I have used pattern synonyms to restore the impression that we
are just working with a Boolean type, but the ! serves to unpack
evidence when we test and to pack it when we inform. We shall
usually be able to keep silent about ordering evidence, even from
the point of its introduction. For ℕ, let us have

owoto : ∀ x y → OWOTO Lℕ (x , y)
owoto 0 y = le
owoto (s x) 0 = ge
owoto (s x) (s y) = owoto x y

Note that we speak only of the crucial bit of information. Moreover,
we especially benefit from type-level computation in the
step case: OWOTO Lℕ (s x , s y) is the very same type as
OWOTO Lℕ (x , y).
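
The following self-contained sketch replays the totality test with evidence made explicit. The names Zero, One, Nat, inl, inr and max are assumptions of the sketch, and proofs are passed as ordinary values rather than hidden behind ! and instance arguments.

```agda
data Zero : Set where

record One : Set where
  constructor <>

data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

-- 'Less or equal' computed as a type: One where it holds, Zero where
-- it cannot.
_≤_ : Nat → Nat → Set
zero  ≤ y     = One
suc x ≤ zero  = Zero
suc x ≤ suc y = x ≤ y

data _+_ (S T : Set) : Set where
  inl : S → S + T
  inr : T → S + T

-- The order holds one way or the other. In the step case the goal
-- computes to the type of the recursive call, so no proof is written.
owoto : (x y : Nat) → (x ≤ y) + (y ≤ x)
owoto zero    y       = inl <>
owoto (suc x) zero    = inr <>
owoto (suc x) (suc y) = owoto x y

-- A client which branches on the test; each branch has the relevant
-- evidence in hand, though this one has no need of it.
max : Nat → Nat → Nat
max x y with owoto x y
... | inl _ = y
... | inr _ = x
```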

Any ordering relation on elements lifts readily to bounds: I have 
overloaded the notation for lifting in the typesetting of this paper, 
but sadly not in the Agda source code. Let us take the opportunity 
to add propositional wrapping, to help us hide ordering proofs. 

_⊤⊥ ⌜_⌝ : ∀ {P} → Rel P → Rel P⊤⊥
L⊤⊥ (_ , ⊤) = 1
L⊤⊥ (# x , # y) = L (x , y)
L⊤⊥ (⊥ , _) = 1
L⊤⊥ (_ , _) = 0

⌜ L ⌝ xy = ⌜ L⊤⊥ xy ⌝

The type ⌜ L ⌝ (x , y) thus represents ordering evidence on bounds
with matching and construction by !, unaccompanied.

6. Equipment for Relations and Other Families 

Before we get back to work in earnest, let us build a few tools
for working with relations and other such indexed type families: a
relation is a family which happens to be indexed by a pair. We shall
have need of pointwise truth and falsity.

0̇ 1̇ : {I : Set} → I → Set
0̇ i = 0
1̇ i = 1

We shall also need to lift disjunction, conjunction and implication
to their pointwise counterparts.

_+̇_ _×̇_ _→̇_ : {I : Set} →
  (I → Set) → (I → Set) → I → Set
(S +̇ T) i = S i + T i
(S ×̇ T) i = S i × T i
(S →̇ T) i = S i → T i

infixr 3 _+̇_ ; infixr 4 _×̇_ ; infixr 2 _→̇_

Pointwise implication will be useful for writing index-respecting
functions, e.g., bounds-preserving operations. It is useful to be able
to state that something holds at every index (i.e., ‘always works’).

[_] : {I : Set} → (I → Set) → Set
[ F ] = ∀ {i} → F i

With this apparatus, we can quite often talk about indexed things
without mentioning the indices, resulting in code which almost
looks like its simply typed counterpart. You can check that for any
S and T, ◁ : [ S →̇ S +̇ T ] and so forth.
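
By way of illustration, here is a minimal self-contained sketch of an index-respecting function written without mention of its index. The names _⟶_ (standing for the dotted arrow), Nat, Vec and vmap are assumptions of the sketch, not part of this development.

```agda
data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

-- Pointwise implication between indexed families.
_⟶_ : {I : Set} → (I → Set) → (I → Set) → I → Set
(S ⟶ T) i = S i → T i

-- Holding at every index.
[_] : {I : Set} → (I → Set) → Set
[ F ] = ∀ {i} → F i

-- Vectors: lists indexed by their length.
data Vec (X : Set) : Nat → Set where
  []  : Vec X zero
  _∷_ : ∀ {n} → X → Vec X n → Vec X (suc n)

-- A length-preserving map: the type never names the length.
vmap : {X Y : Set} → (X → Y) → [ Vec X ⟶ Vec Y ]
vmap f []       = []
vmap f (x ∷ xs) = f x ∷ vmap f xs
```

The type of vmap f unfolds to ∀ {i} → Vec X i → Vec Y i, which is why the clauses may take a vector argument directly.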

7. Working with Bounded Sets 

It will be useful to consider sets indexed by bounds in the same
framework as relations on bounds: propositions-as-types means we
have been doing this from the start! A useful combinator on such
sets is the pivoted pair, S ∧̇ T, indicating that some pivot value
p exists, with S holding before p and T afterwards. A pattern
synonym arranges the order neatly.

_∧̇_ : ∀ {P} → Rel P⊤⊥ → Rel P⊤⊥ → Rel P⊤⊥
_∧̇_ {P} S T (l , u) = Σ P λ p → S (l , # p) × T (# p , u)
pattern _‚_‚_ s p t = p , s , t

infixr 5 _‚_‚_

Immediately, we can define an interval to be the type of an
element proven to lie within given bounds.

_• : ∀ {P} (L : Rel P) → Rel P⊤⊥
L• = ⌜ L ⌝ ∧̇ ⌜ L ⌝

pattern _° p = ! ‚ p ‚ !

With habitual tidiness, a pattern synonym conceals the evidence.
Let us then parametrize over some

owoto : ∀ x y → OWOTO L (x , y)

and reorganise our development.

data BST (lu : P⊤⊥ × P⊤⊥) : Set where
  leaf : BST lu
  pnode : ((⌜ L ⌝ ×̇ BST) ∧̇ (⌜ L ⌝ ×̇ BST) →̇ BST) lu
pattern node lt p rt = pnode (p , (! , lt) , (! , rt))

Reassuringly, the standard undergraduate error, arising from
thinking about doing rather than being, is now ill typed,

insert : [ L• →̇ BST →̇ BST ]
insert y° leaf = node leaf y leaf
insert y° (node lt p rt) with owoto y p
... | le = (insert y° lt)
... | ge = (insert y° rt)

However, once we remember to restore the unchanged parts of
the tree, we achieve victory, at last!

insert : [ L• →̇ BST →̇ BST ]
insert y° leaf = node leaf y leaf
insert y° (node lt p rt) with owoto y p
... | le = node (insert y° lt) p rt
... | ge = node lt p (insert y° rt)

The evidence generated by testing owoto y p is just what is 
needed to access the appropriate subtree. We have found a method 
which seems to work! But do not write home yet. 







8. The Importance of Local Knowledge 

Our current representation of an ordered tree with n elements
contains 2n pieces of ordering evidence, which is n − 1 too many.
We should need only n + 1 proofs, relating the lower bound to
the least element, then comparing neighbours all the way along to 
the greatest element (one per element, so far) which must then fall 
below the upper bound (so, one more). As things stand, the pivot 
at the root is known to be greater than every element in the right 
spine of its left subtree and less than every element in the left spine 
of its right subtree. If the tree was built by iterated insertion, these 
comparisons will surely have happened, but that does not mean we 
should retain the information. 

Suppose, for example, that we want to rotate a tree, perhaps to 
keep it balanced, then we have a little local difficulty: 

rotR : [ BST →̇ BST ]
rotR (node (node lt m mt) p rt)
  = node lt m (node mt p rt)
rotR t = t

Agda rejects the outer node of the rotated tree for lack of evidence. 
I expand the pattern synonyms to show what is missing. 

rotR : [ BST →̇ BST ]
rotR (pnode
  ((! {{lp}} , pnode ((! {{lm}} , lt) ‚ m ‚ (! {{mp}} , mt)))
   ‚ p ‚ (! {{pu}} , rt))) = pnode ((! {{lm}} , lt) ‚ m ‚
  (! {{?₀}} , pnode ((! {{mp}} , mt) ‚ p ‚ (! {{pu}} , rt))))
rotR t = t

We can discard the non-local ordering evidence lp : L⊤⊥ (l , # p),
but now we need the non-local ?₀ : L⊤⊥ (# m , u) that we lack. Of
course, we can prove this goal from mp and pu if L is transitive,
but if we want to make less work, we should rather not demand
non-local ordering evidence in the first place.

Looking back at the type of node, note that the indices at which 
we demand ordering are the same as the indices at which we 
demand subtrees. If we strengthen the invariant on trees to ensure 
that there is a sequence of ordering steps from the lower to the upper 
bound, we could dispense with the sometimes non-local evidence 
stored in nodes, at the cost of a new constraint for leaf. 

data BST (lu : P⊤⊥ × P⊤⊥) : Set where
  pleaf : (⌜ L ⌝ →̇ BST) lu
  pnode : (BST ∧̇ BST →̇ BST) lu
pattern leaf = pleaf !
pattern node lt p rt = pnode (lt ‚ p ‚ rt)

Indeed, a binary tree with n nodes will have n + 1 leaves. An in-order
traversal of a binary tree is a strict alternation, leaf-node-leaf-
... -node-leaf, making a leaf the ideal place to keep the evidence
that neighbouring nodes are in order! Insertion remains easy.

insert : [ L• →̇ BST →̇ BST ]
insert y° leaf = node leaf y leaf
insert y° (node lt p rt) with owoto y p
... | le = node (insert y° lt) p rt
... | ge = node lt p (insert y° rt)

Rotation becomes very easy: the above code now typechecks,
with no leaves in sight, so no proofs to rearrange!

rotR : [ BST →̇ BST ]
rotR (node (node lt m mt) p rt)
  = node lt m (node mt p rt)
rotR t = t
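
The point can be checked in miniature. The sketch below is self-contained and makes several assumptions of its own: elements are natural numbers, bounds are a type Ext with constructors bot, el and top, the lifted order is called _≤E_, and the leaf carries its evidence as an explicit argument rather than an instance field.

```agda
data Zero : Set where

record One : Set where
  constructor <>

data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

_≤_ : Nat → Nat → Set
zero  ≤ y     = One
suc x ≤ zero  = Zero
suc x ≤ suc y = x ≤ y

-- Natural numbers extended with bottom and top.
data Ext : Set where
  bot : Ext
  el  : Nat → Ext
  top : Ext

-- The order lifted to bounds.
_≤E_ : Ext → Ext → Set
_    ≤E top  = One
el x ≤E el y = x ≤ y
bot  ≤E _    = One
_    ≤E _    = Zero

-- Pivots at nodes, ordering evidence only at leaves.
data BST (l u : Ext) : Set where
  leaf : l ≤E u → BST l u
  node : (p : Nat) → BST l (el p) → BST (el p) u → BST l u

-- Right rotation mentions no leaf, hence no proof.
rotR : ∀ {l u} → BST l u → BST l u
rotR (node p (node m lt mt) rt) = node m lt (node p mt rt)
rotR t = t
```

The pivot comes first in node here because the sketch does without the pattern synonym which restores left-to-right order.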


We have arrived at a neat way to keep a search tree in order, 
storing pivot elements at nodes and proofs in leaves. Phew! 

But it is only the end of the beginning. To complete our sorting 
algorithm, we need to flatten binary search trees to ordered lists. 
Are we due another long story about the discovery of a good 
definition of the latter? Fortunately not! The key idea is that an 
ordered list is just a particularly badly balanced binary search 
tree, where every left subtree is a leaf. We can nail that down in 
short order, just by inlining leaf’s data in the left subtree of node, 
yielding a sensible cons. 

data OList (lu : P⊤⊥ × P⊤⊥) : Set where
  nil : (⌜ L ⌝ →̇ OList) lu
  cons : (⌜ L ⌝ ∧̇ OList →̇ OList) lu

These are exactly the ordered lists Sam Lindley and I defined in 
Haskell [11], but now we can see where the definition comes from. 
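
For a concrete feel of these lists, here is a self-contained sketch of bounded ordered lists of natural numbers with insertion. It assumes its own ASCII names (Ext, _≤E_, inl, inr) and passes ordering evidence explicitly, where the development in this paper would hide it behind ! and instance arguments.

```agda
data Zero : Set where

record One : Set where
  constructor <>

data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

_≤_ : Nat → Nat → Set
zero  ≤ y     = One
suc x ≤ zero  = Zero
suc x ≤ suc y = x ≤ y

data _+_ (S T : Set) : Set where
  inl : S → S + T
  inr : T → S + T

owoto : (x y : Nat) → (x ≤ y) + (y ≤ x)
owoto zero    y       = inl <>
owoto (suc x) zero    = inr <>
owoto (suc x) (suc y) = owoto x y

data Ext : Set where
  bot : Ext
  el  : Nat → Ext
  top : Ext

_≤E_ : Ext → Ext → Set
_    ≤E top  = One
el x ≤E el y = x ≤ y
bot  ≤E _    = One
_    ≤E _    = Zero

-- nil closes the gap between the bounds; cons checks its head against
-- the lower bound, then uses it to bound the tail.
data OList (l u : Ext) : Set where
  nil  : l ≤E u → OList l u
  cons : (p : Nat) → l ≤E el p → OList (el p) u → OList l u

-- Insertion preserves the bounds, given that the new element lies
-- within them.
insert : ∀ {l u} (y : Nat) → l ≤E el y → el y ≤E u → OList l u → OList l u
insert y ly yu (nil _) = cons y ly (nil yu)
insert y ly yu (cons p lp pt) with owoto y p
... | inl yp = cons y ly (cons p yp pt)
... | inr py = cons p lp (insert y py yu pt)

-- Inserting 1 and then 2 into the empty list between bot and top.
sample : OList bot top
sample = insert (suc (suc zero)) <> <> (insert (suc zero) <> <> (nil <>))
```

Every proof written here is either <> or a variable already in scope, which is exactly the evidence the instance-argument machinery would supply unprompted.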

By figuring out how to build ordered binary search trees, we 
have actually discovered how to build quite a variety of in-order 
data structures. We simply need to show how the data are built from 
particular patterns of BST components. So, rather than flattening 
binary search trees, let us pursue a generic account of in-order 
datatypes, then flatten them all. 

9. Jansson and Jeuring’s PolyP Universe

If we want to see how to make the treatment of ordered container 
structures systematic, we shall need some datatype-generic account 
of recursive types with places for elements. A compelling starting 
point is the ‘PolyP’ system of Patrik Jansson and Johan Jeuring [8], 
which we can bottle as a universe—a system of codes for types—in 
Agda, as follows: 

data JJ : Set where
  ‘R ‘P ‘1 : JJ
  _‘+_ _‘×_ : JJ → JJ → JJ

infixr 4 _‘+_
infixr 5 _‘×_

The ‘R stands for ‘recursive substructure’ and the ‘P stands for 
‘parameter’—the type of elements stored in the container. Given 
meanings for these, we interpret a code in JJ as a set. 

⟦_⟧JJ : JJ → Set → Set → Set
⟦ ‘R ⟧JJ R P = R
⟦ ‘P ⟧JJ R P = P
⟦ ‘1 ⟧JJ R P = 1
⟦ S ‘+ T ⟧JJ R P = ⟦ S ⟧JJ R P + ⟦ T ⟧JJ R P
⟦ S ‘× T ⟧JJ R P = ⟦ S ⟧JJ R P × ⟦ T ⟧JJ R P

When we ‘tie the knot’ in μJJ F P, we replace F’s ‘Ps by some
actual P and its ‘Rs by recursive uses of μJJ F P.

data μJJ (F : JJ) (P : Set) : Set where
  ⟨_⟩ : ⟦ F ⟧JJ (μJJ F P) P → μJJ F P
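
To see the universe at work, the following self-contained sketch encodes ordinary lists as a code and programs with them. The ASCII backquote names, the non-dependent pair record, and the definitions ListF, List, nil, cons and length are all assumptions of the sketch.

```agda
record One : Set where
  constructor <>

data _+_ (S T : Set) : Set where
  inl : S → S + T
  inr : T → S + T

record _×_ (S T : Set) : Set where
  constructor _,_
  field
    fst : S
    snd : T

data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

-- Codes: recursive position, parameter, unit, sum, product.
data JJ : Set where
  `R `P `1  : JJ
  _`+_ _`×_ : JJ → JJ → JJ

-- Interpret a code, given sets for the recursive positions and the
-- parameter.
⟦_⟧ : JJ → Set → Set → Set
⟦ `R ⟧     R P = R
⟦ `P ⟧     R P = P
⟦ `1 ⟧     R P = One
⟦ S `+ T ⟧ R P = ⟦ S ⟧ R P + ⟦ T ⟧ R P
⟦ S `× T ⟧ R P = ⟦ S ⟧ R P × ⟦ T ⟧ R P

-- Tie the knot.
data μ (F : JJ) (P : Set) : Set where
  ⟨_⟩ : ⟦ F ⟧ (μ F P) P → μ F P

-- Lists: either nothing, or an element paired with a sublist.
ListF : JJ
ListF = `1 `+ (`P `× `R)

List : Set → Set
List = μ ListF

nil : {P : Set} → List P
nil = ⟨ inl <> ⟩

cons : {P : Set} → P → List P → List P
cons x xs = ⟨ inr (x , xs) ⟩

length : {P : Set} → List P → Nat
length ⟨ inl <> ⟩       = zero
length ⟨ inr (x , xs) ⟩ = suc (length xs)
```
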
Being finitary and first-order, all of the containers encoded by
JJ are traversable in the sense defined by Ross Paterson and myself
[14]. We shall need to introduce the interface for Applicative
functors

record Applicative (H : Set → Set) : Set₁ where
  field
    pure : ∀ {X} → X → H X
    ap : ∀ {S T} → H (S → T) → H S → H T

open Applicative

and then abstract over Applicative to compute the datatype generic
treatment of traverse.

traverse : ∀ {H F A B} → Applicative H →
  (A → H B) → μJJ F A → H (μJJ F B)
traverse {H} {F} {A} {B} AH h t = go ‘R t where
  pu = pure AH ; _⊛_ = ap AH
  go : ∀ G →
    ⟦ G ⟧JJ (μJJ F A) A → H (⟦ G ⟧JJ (μJJ F B) B)
  go ‘R ⟨ t ⟩ = pu ⟨_⟩ ⊛ go F t
  go ‘P a = h a
  go ‘1 ⟨⟩ = pu ⟨⟩
  go (S ‘+ T) (◁ s) = pu ◁ ⊛ go S s
  go (S ‘+ T) (▷ t) = pu ▷ ⊛ go T t
  go (S ‘× T) (s , t) = (pu _,_ ⊛ go S s) ⊛ go T t

We can specialise traverse to standard functorial map by choosing
the identity functor.

idApp : Applicative (λ X → X)
idApp = record {pure = id ; ap = id}
map : ∀ {F A B} →
  (A → B) → μJJ F A → μJJ F B
map = traverse idApp

We can equally well specialise traverse to a monoidal crush by 
choosing a constant functor. 

record Monoid (X : Set) : Set where
  field
    neutral : X
    combine : X → X → X
  monApp : Applicative (λ _ → X)
  monApp = record
    {pure = λ _ → neutral ; ap = combine}
  crush : ∀ {P F} → (P → X) → μJJ F P → X
  crush = traverse {B = 0} monApp
open Monoid

Perversely, the fact that the constant functor discards the return
value type, B in traverse’s type signature, results in the absence
of constraints on B in the definition of crush, and hence the need
to give B explicitly. I choose 0 merely to emphasize that B-values
are not involved.

Endofunctions on a given set form a monoid with respect to 
composition, which allows us a generic foldr-style operation. 

compMon : ∀ {X} → Monoid (X → X)
compMon = record
  {neutral = id ; combine = λ f g → f ∘ g}
foldr : ∀ {F A B} →
  (A → B → B) → B → μJJ F A → B
foldr f b t = crush compMon f t b

We can use foldr to build up B s from any structure containing As, 
given a way to ‘insert’ an A into a B, and an ‘empty’ B to start 
with. Let us check that our generic machinery is fit for purpose. 

10. The Simple Orderable Subuniverse of JJ 

The quicksort algorithm divides a sorting problem in two by partitioning
the remaining data about a selected pivot element. Rendered
as the process of building then flattening a binary search tree [4],
the pivot element clearly marks the upper bound of the lower subtree
and the lower bound of the upper subtree, giving exactly the
information required to guide insertion.

We can require the presence of pivots between substructures by
combining the parameter ‘P and pairing ‘× constructs of the PolyP
universe into a single pivoting construct, ‘∧, with two substructures
and a pivot in between. We thus acquire the simple orderable
universe, SO, a subset of JJ picked out as the image of a function,
⟦_⟧SO. Now, ‘P stands also for pivot!

data SO : Set where
  ‘R ‘1 : SO
  _‘+_ _‘∧_ : SO → SO → SO
infixr 5 _‘∧_

⟦_⟧SO : SO → JJ
⟦ ‘R ⟧SO = ‘R
⟦ ‘1 ⟧SO = ‘1
⟦ S ‘+ T ⟧SO = ⟦ S ⟧SO ‘+ ⟦ T ⟧SO
⟦ S ‘∧ T ⟧SO = ⟦ S ⟧SO ‘× ‘P ‘× ⟦ T ⟧SO

μSO : SO → Set → Set
μSO F P = μJJ ⟦ F ⟧SO P

Let us give SO codes for structures we often order and bound: 
‘List ‘Tree ‘Interval : SO
‘List = ‘1 ‘+ (‘1 ‘∧ ‘R)
‘Tree = ‘1 ‘+ (‘R ‘∧ ‘R)
‘Interval = ‘1 ‘∧ ‘1

Every data structure described by SO is a regulated variety of
node-labelled binary trees. Let us check that we can turn anything
into a tree, preserving the substructure relationship. The method¹
is to introduce a helper function, go, whose type separates G,
the structure of the top node, from F, the structure of recursive
subnodes, allowing us to take the top node apart: we kick off with
G = F.

tree : ∀ {P F} → μSO F P → μSO ‘Tree P
tree {P} {F} ⟨ f ⟩ = go F f where
  go : ∀ G → ⟦ ⟦ G ⟧SO ⟧JJ (μSO F P) P → μSO ‘Tree P
  go ‘R f = tree f
  go ‘1 ⟨⟩ = ⟨ ◁ ⟨⟩ ⟩
  go (S ‘+ T) (◁ s) = go S s
  go (S ‘+ T) (▷ t) = go T t
  go (S ‘∧ T) (s , p , t) = ⟨ ▷ (go S s , p , go T t) ⟩

All tree does is strip out the ◁s and ▷s corresponding to the
structural choices offered by the input type and instead label the
void leaves ◁ and the pivoted nodes ▷. Note well that a singleton
tree has void leaves as its left and right substructures, and hence
that the inorder traversal is a strict alternation of leaves and pivots,
beginning with the leaf at the end of the left spine and ending with
the leaf at the end of the right spine. As our tree function preserves
the leaf/pivot structure of its input, we learn that every datatype we
can define in SO stores such an alternation of leaves and pivots.



We are now in a position to roll out the “loose bounds” method
to the whole of the SO universe. We need to ensure that each pivot
is in order with its neighbours and with the outer bounds, and the
alternating leaf/pivot structure gives us just what we need: let us
store the ordering evidence at the leaves!

¹ If you try constructing the division operator as a primitive recursive
function, this method will teach itself to you.

⟦_⟧≤SO : SO → ∀ {P} → Rel P⊤⊥ → Rel P → Rel P⊤⊥
⟦ ‘R ⟧≤SO R L = R
⟦ ‘1 ⟧≤SO R L = ⌜ L ⌝
⟦ S ‘+ T ⟧≤SO R L = ⟦ S ⟧≤SO R L +̇ ⟦ T ⟧≤SO R L
⟦ S ‘∧ T ⟧≤SO R L = ⟦ S ⟧≤SO R L ∧̇ ⟦ T ⟧≤SO R L

data μ≤SO (F : SO) {P : Set} (L : Rel P)
          (lu : P⊤⊥ × P⊤⊥) : Set where
  ⟨_⟩ : ⟦ F ⟧≤SO (μ≤SO F L) L lu → μ≤SO F L lu


We have shifted from sets to relations, in that our types are indexed 
by lower and upper bounds. The leaves demand evidence that the 
bounds are in order, whilst the nodes require the pivot first, then use 
it to bound the substructures appropriately. 

Meanwhile, the need in nodes to bound the left substructure’s 
type with the pivot value disrupts the left-to-right spatial ordering 
of the data, but we can apply a little cosmetic treatment, thanks to 
the availability of pattern synonyms. 

With these two devices available, let us check that we can still
turn any ordered data into an ordered tree, writing L Δ (l, u) for
μ≤SO ‘Tree L (l, u), and redefining intervals accordingly.


_Δ _• : ∀ {P} → Rel P → Rel P⊤⊥
L Δ = μ≤SO ‘Tree L
pattern leaf = ⟨ ◁ ! ⟩
pattern node lp p pu = ⟨ ▷ (lp ‚ p ‚ pu) ⟩
L • = μ≤SO ‘Interval L
pattern _° p = ⟨ (p , ! , !) ⟩


tree : ∀ {P F} {L : Rel P} → [ μ≤SO F L →̇ L Δ ]
tree {P} {F} {L} ⟨ f ⟩ = go F f where
  go : ∀ G → [ ⟦ G ⟧≤SO (μ≤SO F L) L →̇ L Δ ]
  go ‘R f = tree f
  go ‘1 ! = leaf
  go (S ‘+ T) (◁ s) = go S s
  go (S ‘+ T) (▷ t) = go T t
  go (S ‘∧ T) (s ‚ p ‚ t) = node (go S s) p (go T t)


We have acquired a collection of orderable datatypes which 
all amount to specific patterns of node-labelled binary trees: an 
interval is a singleton node; a list is a right spine. All share the 
treelike structure which ensures that pivots alternate with leaves 
bearing the evidence the pivots are correctly placed with respect to 
their immediate neighbours. 

Let us check that we are where we were, so to speak. Hence we 
can rebuild our binary search tree insertion for an element in the 
corresponding interval: 

insert : [ L • →̇ L Δ →̇ L Δ ]
insert y° leaf = node leaf y leaf
insert y° (node lt p rt) with owoto y p
... | le = node (insert y° lt) p rt
... | ge = node lt p (insert y° rt)


The constraints on the inserted element are readily expressed via 
our ‘Interval type, but at no point need we ever name the ordering 
evidence involved. The owoto test brings just enough new evidence 
into scope that all proof obligations on the right-hand side can be 
discharged by search of assumptions. We can now make a search 
tree from any input container. 

makeTree : ∀ {F} → μJJ F P → L Δ (⊥ , ⊤)
makeTree = foldr (λ p → insert p°) leaf
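
For comparison, here is a rough Haskell counterpart with the bounds and ordering evidence erased. It is an illustrative sketch only: the Ord constraint stands in for owoto, and the inorder helper is an addition for inspecting results. The program text has the same shape as the Agda version, which is the point of the section; what is lost is the guarantee.

```haskell
data Tree p = Leaf | Node (Tree p) p (Tree p)

-- Binary search tree insertion; (<=) plays the role of the owoto test.
insert :: Ord p => p -> Tree p -> Tree p
insert y Leaf = Node Leaf y Leaf
insert y (Node lt p rt)
  | y <= p    = Node (insert y lt) p rt
  | otherwise = Node lt p (insert y rt)

-- Build a search tree from any Foldable container of keys.
makeTree :: (Foldable f, Ord p) => f p -> Tree p
makeTree = foldr insert Leaf

-- Inorder list of pivots (helper for inspection, not in the paper).
inorder :: Tree p -> [p]
inorder Leaf         = []
inorder (Node l p r) = inorder l ++ p : inorder r
```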


11. Digression: Merging Monoidally 

Let us name our family of ordered lists L⁺, as the leaves form a
nonempty chain of ⌜L⌝ᴾ ordering evidence.

_⁺ : ∀ {P} → Rel P → Rel P⊤⊥
L ⁺ = μ≤SO ‘List L

pattern [] = ⟨ ◁ ! ⟩
pattern _∷_ x xs = ⟨ ▷ (x , ! , xs) ⟩
infixr 6 _∷_

The next section addresses the issue of how to flatten ordered 
structures to ordered lists, but let us first consider how to merge 
them. Merging sorts differ from flattening sorts in that order is 
introduced when ‘conquering’ rather than ‘dividing’. 

We can be sure that whenever two ordered lists share lower 
and upper bounds, they can be merged within the same bounds. 
Again, let us assume a type P of pivots, with owoto witnessing 
the totality of order L. The familiar definition of merge typechecks 
but falls just outside the class of lexicographic recursions accepted 
by Agda’s termination checker. I have locally expanded pattern 
synonyms to dig out the concealed evidence which causes the 
trouble. 

merge : [ L⁺ →̇ L⁺ →̇ L⁺ ]
merge [] ys = ys
merge xs [] = xs
merge ⟨ ▷ (x , ! , xs) ⟩ (y ∷ ys) with owoto x y
... | le = x ∷ merge xs (y ∷ ys)
... | ge = y ∷ merge ⟨ ▷ (x , ! , xs) ⟩ ys

In one step case, the first list gets smaller, but in the other, where 
we decrease the second list, the first does not remain the same: it 
contains fresh evidence that x is above the tighter lower bound, y. 
Separating the recursion on the second list is sufficient to show that 
both recursions are structural, 
merge : [ L⁺ →̇ L⁺ →̇ L⁺ ]
merge [] = id
merge {l , u} (x ∷ xs) = go where
  go : ∀ {l} {{_ : L⊤⊥ (l , # x)}} → (L⁺ →̇ L⁺) (l , u)
  go [] = x ∷ xs
  go (y ∷ ys) with owoto x y
  ... | le = x ∷ merge xs (y ∷ ys)
  ... | ge = y ∷ go ys

The helper function go inserts x at its rightful place in the second 
list, then resumes merging with xs. 
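
The same recursion structure can be sketched on ordinary Haskell lists. This is an illustration under the assumption of an Ord instance in place of owoto; Haskell has no termination checker, so the split serves only to show the shape. The outer function recurses on the first list, and the local helper go recurses on the second list while x and xs stay fixed.

```haskell
-- Merge two ordered lists; each recursion is structural on one argument.
merge :: Ord a => [a] -> [a] -> [a]
merge []       ys = ys
merge (x : xs) ys = go ys
  where
    -- go walks down the second list, looking for the place of x.
    go []         = x : xs
    go (z : zs)
      | x <= z    = x : merge xs (z : zs)  -- first list shrinks
      | otherwise = z : go zs              -- second list shrinks
```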

Merging equips ordered lists with monoidal structure. 
olMon : ∀ {lu} → L⊤⊥ lu ⇒ Monoid (L⁺ lu)
olMon = record {neutral = [] ; combine = merge}

An immediate consequence is that we gain a family of sorting
algorithms which amount to depth-first merging of a given
intermediate data structure, making a singleton from each pivot,
mergeJJ : ∀ {F} → μJJ F P → L⁺ (⊥ , ⊤)
mergeJJ = crush olMon λ p → p ∷ []

The instance of mergeJJ for lists is exactly insertion sort: at each
cons, the singleton list of the head is merged with the sorted tail. To
obtain an efficient mergeSort, we should arrange the inputs as a
leaf-labelled binary tree.

‘qLTree : JJ
‘qLTree = (‘1 ‘+ ‘P) ‘+ ‘R ‘× ‘R
As ever, pattern synonyms prove invaluable for restoring
readability.
pattern none = ⟨ ◁ (◁ ⟨⟩) ⟩
pattern one p = ⟨ ◁ (▷ p) ⟩
pattern fork l r = ⟨ ▷ (l , r) ⟩

We can add each successive element to the tree with a twisting
insertion, placing the new element at the bottom of the left spine,
but swapping the subtrees at each layer along the way to ensure fair
distribution.

twistIn : P → μJJ ‘qLTree P → μJJ ‘qLTree P
twistIn p none = one p
twistIn p (one q) = fork (one p) (one q)
twistIn p (fork l r) = fork (twistIn p r) l
If we notice that twistIn maps elements to endofunctions on
trees, we can build up trees by a monoidal crush, obtaining an
efficient generic sort for any container in the JJ universe.
mergeSort : ∀ {F} → μJJ F P → L⁺ (⊥ , ⊤)
mergeSort = mergeJJ ∘ foldr twistIn none
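
A plain Haskell rendering of the twisting insertion may help to see why the tree stays balanced. This is a sketch with evidence erased; QLTree, mergeAll and the textbook two-list merge are stand-ins chosen for this illustration, not the paper's definitions.

```haskell
-- Leaf-labelled binary trees: empty, a single key, or a fork.
data QLTree p = None | One p | Fork (QLTree p) (QLTree p)

-- Insert at the bottom of the left spine, swapping subtrees on the way
-- down so that successive insertions are distributed fairly.
twistIn :: p -> QLTree p -> QLTree p
twistIn p None       = One p
twistIn p (One q)    = Fork (One p) (One q)
twistIn p (Fork l r) = Fork (twistIn p r) l

-- Ordinary merge of two ordered lists.
merge :: Ord p => [p] -> [p] -> [p]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- Depth-first merging of the tree, a singleton for each key.
mergeAll :: Ord p => QLTree p -> [p]
mergeAll None       = []
mergeAll (One p)    = [p]
mergeAll (Fork l r) = merge (mergeAll l) (mergeAll r)

mergeSort :: (Foldable f, Ord p) => f p -> [p]
mergeSort = mergeAll . foldr twistIn None
```

Each insertion sends the new key into what was the right subtree and then swaps the two, so the sizes of siblings never differ by more than one.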

12. Flattening With Concatenation 

Several sorting algorithms amount to building an ordered
intermediate structure, then flattening it to an ordered list. As all of our
orderable structures amount to trees, it suffices to flatten trees to
lists. Let us take the usual naive approach as our starting point. In
Haskell, we might write
flatten Leaf = []
flatten (Node l p r) = flatten l ++ p : flatten r

so let us try to do the same in Agda with ordered lists. We shall 
need concatenation, so let us try to join lists with a shared bound p 
in the middle, 
infixr 8 _++_
_++_ : ∀ {P} {L : Rel P} {l p u} →
  L⁺ (l , p) → L⁺ (p , u) → L⁺ (l , u)
[] ++ ys = ?
(x ∷ xs) ++ ys = x ∷ xs ++ ys

The ‘cons’ case goes without a hitch, but there is trouble at ‘nil’.
We have ys : μ≤SO ‘List L (p , u) and we know L⊤⊥ (l , p), but we
need to return a μ≤SO ‘List L (l , u).


“The trouble is easy to fix,” one might confidently assert, whilst 
secretly thinking, “What a nuisance!”. We can readily write a helper 
function which unpacks ys, and whether it is nil or cons, extends its 
leftmost order evidence by transitivity. And this really is a nuisance, 
because, thus far, we have not required transitivity to keep our 
code well typed: all order evidence has stood between neighbouring 
elements. Here, we have two pieces of ordering evidence which we 
must join, because we have nothing to put in between them. Then, 
the penny drops. Looking back at the code for flatten, observe that 
p is the pivot and the whole plan is to put it between the lists. You 
can’t always get what you want, but you can get what you need, 
sandwich : ∀ {P} {L : Rel P} → [ (L⁺ ∧̇ L⁺) →̇ L⁺ ]
sandwich ([] ‚ p ‚ ys) = p ∷ ys
sandwich (x ∷ xs ‚ p ‚ ys) = x ∷ sandwich (xs ‚ p ‚ ys)

We are now ready to flatten trees, thence any ordered structure: 
flatten : ∀ {P} {L : Rel P} → [ L Δ →̇ L⁺ ]
flatten leaf = []
flatten (node l p r) = sandwich (flatten l ‚ p ‚ flatten r)

flatten≤SO : ∀ {P} {L : Rel P} {F} → [ μ≤SO F L →̇ L⁺ ]
flatten≤SO = flatten ∘ tree
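
In Haskell terms, sandwich is concatenation with the pivot supplied explicitly. A minimal sketch with the bounds erased, so that only the data flow remains:

```haskell
data Tree p = Leaf | Node (Tree p) p (Tree p)

-- Join two lists with a pivot in the middle. In the typed version the
-- pivot is what lets neighbouring evidence line up without transitivity.
sandwich :: [p] -> p -> [p] -> [p]
sandwich []       p ys = p : ys
sandwich (x : xs) p ys = x : sandwich xs p ys

-- Naive flattening: quadratic on left-nested trees, like (++).
flatten :: Tree p -> [p]
flatten Leaf         = []
flatten (Node l p r) = sandwich (flatten l) p (flatten r)
```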

For a little extra speed we might fuse that composition, but
it seems frivolous to do so as the benefit is outweighed by the
quadratic penalty of left-nested concatenation. The standard
remedy applies: we can introduce an accumulator [18], but our
experience with ++ should alert us to the possibility that it may require
some thought.


13. Faster Flattening, Generically 

We may define flatten generically, and introduce an accumulator 
yielding a combined flatten-and-append which works right-to-left, 
growing the result with successive conses. But what should be the 
bounds of the accumulator? If we have not learned our lesson, we 
might be tempted by 


flapp : ∀ {F P} {L : Rel P} {l p u} →
  μ≤SO F L (l , p) → L⁺ (p , u) → L⁺ (l , u)

but again we face the question of what to do when we reach a 
leaf. We should not need transitivity to rearrange a tree of ordered 
neighbours into a sequence. We can adopt the previous remedy 
of inserting the element p in the middle, but we shall then need 
to think about where p will come from in the first instance, for 
example when flattening an empty structure. 


flapp : ∀ {F P} {L : Rel P} G →
  [ (⟦ G ⟧≤SO (μ≤SO F L) L ∧̇ L⁺) →̇ L⁺ ]
flapp {F} ‘R (⟨ t ⟩ ‚ p ‚ ys) = flapp F (t ‚ p ‚ ys)
flapp ‘1 (! ‚ p ‚ ys) = p ∷ ys
flapp (S ‘+ T) (◁ s ‚ p ‚ ys) = flapp S (s ‚ p ‚ ys)
flapp (S ‘+ T) (▷ t ‚ p ‚ ys) = flapp T (t ‚ p ‚ ys)
flapp (S ‘∧ T) ((s ‚ p′ ‚ t) ‚ p ‚ ys)
  = flapp S (s ‚ p′ ‚ flapp T (t ‚ p ‚ ys))


To finish the job, we need to work our way down the right spine of 
the input in search of its rightmost element, which initialises p. 

flatten : ∀ {F P} {L : Rel P} → [ μ≤SO F L →̇ L⁺ ]
flatten {F} {P} {L} {l , u} ⟨ t ⟩ = go F t where
  go : ∀ {l} G → ⟦ G ⟧≤SO (μ≤SO F L) L (l , u) → L⁺ (l , u)
  go ‘R t = flatten t
  go ‘1 ! = []
  go (S ‘+ T) (◁ s) = go S s
  go (S ‘+ T) (▷ t) = go T t
  go (S ‘∧ T) (s ‚ p ‚ t) = flapp S (s ‚ p ‚ go T t)


This is effective, but it is more complicated than I should like. 
It is basically the same function twice, in two different modes,
depending on what is to be affixed after the rightmost order evidence
in the structure being flattened: either a pivot-and-tail in the case 
of flapp, or nothing in the case of flatten. The problem is one of 
parity: the thing we must affix to one odd-length leaf-node-leaf 
alternation to get another is an even-length node-leaf alternation. 
Correspondingly, it is hard to express the type of the accumulator 
cleanly. Once again, I begin to suspect that this is a difficult thing 
to do because it is the wrong thing to do. How can we reframe the 
problem, so that we work only with odd-length leaf-delimited data? 


14. A Replacement for Concatenation 

My mathematical mentor, Tom Körner, is fond of remarking “A
mathematician is someone who knows that 0 is 0 + 0”. It is often
difficult to recognize the structure you need when the problem in
front of you is a degenerate case of it. If we think again about
concatenation, we might realise that it does not amount to affixing
one list to another, but rather replacing the ‘nil’ of the first list with
the whole of the second. We might then notice that the monoidal
structure of lists is in fact degenerate monadic structure.

Any syntax has a monadic structure, where ‘return’ embeds
variables as terms and ‘bind’ is substitution. Quite apart from their
‘prioritised choice’ monadic structure, lists are the terms of a
degenerate syntax with one variable (called ‘nil’) and only unary
operators (‘cons’ with a choice of element). Correspondingly, they
have this substitution structure: substituting nil gives
concatenation, and the monad laws are the monoid laws.

Given this clue, let us consider concatenation and flattening in 
terms of replacing the rightmost leaf by a list, rather than affixing 
more data to it. We replace the list to append with a function which 
maps the contents of the rightmost leaf—some order evidence— 
to its replacement. The type looks more like that of ‘bind’ than 
‘append’, because in some sense it is! 
infixr 8 _++_

RepL : ∀ {P} → Rel P → Rel P⊤⊥
RepL L (n , u) = ∀ {m} → L⊤⊥ (m , n) ⇒ L⁺ (m , u)

_++_ : ∀ {P} {L : Rel P} {l n u} →
  L⁺ (l , n) → RepL L (n , u) → L⁺ (l , u)
[] ++ ys = ys
(x ∷ xs) ++ ys = x ∷ xs ++ ys

Careful use of instance arguments leaves all the manipulation 
of evidence to the machine. In the [] case, ys is silently instantiated 
with exactly the evidence exposed in the [] pattern on the left. 

Let us now deploy the same technique for flatten. 


flapp : ∀ {P} {L : Rel P} {F} {l n u} →
  μ≤SO F L (l , n) → RepL L (n , u) → L⁺ (l , u)
flapp {P} {L} {F} {u = u} t ys = go ‘R t ys where
  go : ∀ {l n} G → ⟦ G ⟧≤SO (μ≤SO F L) L (l , n) →
       RepL L (n , u) → L⁺ (l , u)
  go ‘R ⟨ t ⟩ ys = go F t ys
  go ‘1 ! ys = ys
  go (S ‘+ T) (◁ s) ys = go S s ys
  go (S ‘+ T) (▷ t) ys = go T t ys
  go (S ‘∧ T) (s ‚ p ‚ t) ys = go S s (p ∷ go T t ys)

flatten : ∀ {P} {L : Rel P} {F} → [ μ≤SO F L →̇ L⁺ ]
flatten t = flapp t []
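
Once the evidence is erased, a replacement function of type RepL collapses to an ordinary list, and flapp becomes the familiar accumulator-passing flatten. The Haskell sketch below is therefore only an approximation: it shows the data flow (the second argument is what replaces the rightmost leaf) but cannot show the instantiation of evidence, which is the interesting part of the Agda version.

```haskell
data Tree p = Leaf | Node (Tree p) p (Tree p)

-- Flatten-and-append: ys says what the rightmost leaf is replaced with.
-- Works right to left, growing the result one cons at a time.
flapp :: Tree p -> [p] -> [p]
flapp Leaf         ys = ys
flapp (Node l p r) ys = flapp l (p : flapp r ys)

-- Flattening replaces the rightmost leaf with the empty list.
flatten :: Tree p -> [p]
flatten t = flapp t []
```

Each node is visited once and contributes one cons, so the cost is linear in the size of the tree however it is nested.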


15. An Indexed Universe of Orderable Data 

Ordering is not the only invariant we might want to enforce on
orderable data structures. We might have other properties in mind,
such as size, or balancing invariants. It is straightforward to extend
our simple universe to allow general indexing as well as
orderability. We can extend our simple orderable universe SO to an indexed
orderable universe IO, just by marking each recursive position with
an index, then computing the code for each node as a function of
its index. We may add a ‘0 code to rule out some cases as illegal,
data IO (I : Set) : Set where
  ‘R : I → IO I
  ‘0 ‘1 : IO I
  _‘+_ _‘∧_ : IO I → IO I → IO I

When interpreting such a code, we now require the family of 
relations which sit in recursive positions, one for each element 
of the index set, I. However, the interpretation function is not
concerned with indexing the overall node. The function mapping 
each index to the code for the appropriate node structure appears 
only when we tie the recursive knot. 


⟦_⟧≤IO : ∀ {I P} → IO I →
  (I → Rel P⊤⊥) → Rel P → Rel P⊤⊥
⟦ ‘R i ⟧≤IO R L = R i
⟦ ‘0 ⟧≤IO R L = λ _ → 0
⟦ ‘1 ⟧≤IO R L = ⌜ L ⌝ᴾ
⟦ S ‘+ T ⟧≤IO R L = ⟦ S ⟧≤IO R L +̇ ⟦ T ⟧≤IO R L
⟦ S ‘∧ T ⟧≤IO R L = ⟦ S ⟧≤IO R L ∧̇ ⟦ T ⟧≤IO R L

data μ≤IO {I P : Set} (F : I → IO I) (L : Rel P)
     (i : I) (lu : P⊤⊥ × P⊤⊥) : Set where
  ⟨_⟩ : ⟦ F i ⟧≤IO (μ≤IO F L) L lu → μ≤IO F L i lu
We recover all our existing data structures by trivial indexing.
‘List ‘Tree ‘Interval : 1 → IO 1
‘List _ = ‘1 ‘+ (‘1 ‘∧ ‘R ⟨⟩)
‘Tree _ = ‘1 ‘+ (‘R ⟨⟩ ‘∧ ‘R ⟨⟩)
‘Interval _ = ‘1 ‘∧ ‘1

We also lift our existing type-forming abbreviations: 

_⁺ _Δ _• : ∀ {P} → Rel P → Rel P⊤⊥
L ⁺ = μ≤IO ‘List L ⟨⟩
L Δ = μ≤IO ‘Tree L ⟨⟩
L • = μ≤IO ‘Interval L ⟨⟩

However, we may also make profitable use of indexing: here are 
ordered vectors. 

‘Vec : ℕ → IO ℕ
‘Vec 0 = ‘1
‘Vec (s n) = ‘1 ‘∧ ‘R n

Note that we need no choice of constructor or storage of length 
information: the index determines the shape. If we want, say, even- 
length tuples, we can use ‘0 to rule out the odd cases. 

‘Even : ℕ → IO ℕ
‘Even 0 = ‘1
‘Even (s 0) = ‘0
‘Even (s (s n)) = ‘1 ‘∧ ‘1 ‘∧ ‘R n
We could achieve a still more flexible notion of data structure
by allowing a general Σ-type rather than our binary ‘+, but we
have what we need for finitary data structures with computable
conditions on indices.

The tree operation carries over unproblematically, with more 
indexed input but plain output. 

tree : ∀ {I P F} {L : Rel P} {i : I} →
  [ μ≤IO F L i →̇ L Δ ]

Similarly, flatten works (efficiently) just as before,
flatten : ∀ {I P F} {L : Rel P} {i : I} →
  [ μ≤IO F L i →̇ L⁺ ]

We now have a universe of indexed orderable data structures 
with efficient flattening. Let us put it to work. 

16. Balanced 2-3 Trees 

To ensure a logarithmic access time for search trees, we can keep 
them balanced. Maintaining balance as close to perfect as possible 
is rather fiddly, but we can gain enough balance by allowing a little 
redundancy. A standard way to achieve this is to insist on uniform 
height, but allow internal nodes to have either one pivot and two 
subtrees, or two pivots and three subtrees. We may readily encode 
these 2-3 trees and give pattern synonyms for the three kinds of 
structure. This approach is much like that of red-black (effectively,
2-3-4) trees, for which typesafe balancing has a tradition going
back to Hongwei Xi and Stefan Kahrs [9, 19].

As with ‘Vec, case analysis on the index, now representing 
height, tells us whether we are at a leaf or an internal node. 


‘Tree23 : ℕ → IO ℕ
‘Tree23 0 = ‘1
‘Tree23 (s h) = ‘R h ‘∧ (‘R h ‘+ (‘R h ‘∧ ‘R h))

_²³ : ∀ {P} (L : Rel P) → ℕ → Rel P⊤⊥
L ²³ = μ≤IO ‘Tree23 L

pattern no₀ = ⟨ ! ⟩
pattern no₂ lt p rt = ⟨ p , lt , ◁ rt ⟩
pattern no₃ lt p mt q rt = ⟨ p , lt , ▷ (q , mt , rt) ⟩


When we map a 2-3 tree of height n back to binary trees, we 
get a tree whose left spine has length n and whose right spine has 
a length between n and 2n.

Insertion is quite similar to binary search tree insertion, except 
that it can have the impact of increasing height. The worst that can 
happen is that the resulting tree is too tall but has just one pivot at 
the root. Indeed, we need this extra wiggle room immediately for 
the base case! 


ins23 : ∀ h {lu} → L • lu → L ²³ h lu →
  L ²³ h lu +
  Σ P λ p → L ²³ h (π₁ lu , # p) × L ²³ h (# p , π₂ lu)
ins23 0 y° no₀ = ▷ (⟨ ! ⟩ ‚ y ‚ ⟨ ! ⟩)


In the step case, we must find our way to the appropriate subtree by 
suitable use of comparison. 

ins23 (s h) y° ⟨ lt ‚ p ‚ rest ⟩ with owoto y p
ins23 (s h) y° ⟨ lt ‚ p ‚ rest ⟩ | le = ?₀
ins23 (s h) y° (no₂ lt p rt) | ge = ?₁
ins23 (s h) y° (no₃ lt p mt q rt) | ge with owoto y q
ins23 (s h) y° (no₃ lt p mt q rt) | ge | le = ?₂
ins23 (s h) y° (no₃ lt p mt q rt) | ge | ge = ?₃


Our ?₀ covers the case where the new element belongs in the left
subtree of either a 2- or 3-node; ?₁ handles the right subtree of a
2-node; ?₂ and ?₃ handle middle and right subtrees of a 3-node
after a further comparison. Note that we inspect rest only after we
have checked the result of the first comparison, making real use of
the way the with construct brings more data to the case analysis
but keeps the existing patterns open to further refinement, a need
foreseen by the construct’s designers [13].

Once we have identified the appropriate subtree, we can make 
the recursive call. If we are lucky, the result will plug straight back 
into the same hole. Here is the case for the left subtree. 


ins23 (s h) y° ⟨ lt ‚ p ‚ rest ⟩ | le
  with ins23 h y° lt
ins23 (s h) y° ⟨ lt ‚ p ‚ rest ⟩ | le
  | ◁ lt′ = ◁ ⟨ lt′ ‚ p ‚ rest ⟩

However, if we are unlucky, the result of the recursive call is too 
big. If the top node was a 2-node, we can accommodate the extra 
data by returning a 3-node. Otherwise, we must rebalance and pass 
the ‘too big’ problem upward. Again, we gain from delaying the 
inspection of rest until we are sure reconfiguration will be needed. 
ins23 (s h) y° (no₂ lt p rt) | le
  | ▷ (llt ‚ r ‚ lrt) = ◁ (no₃ llt r lrt p rt)
ins23 (s h) y° (no₃ lt p mt q rt) | le
  | ▷ (llt ‚ r ‚ lrt) = ▷ (no₂ llt r lrt ‚ p ‚ no₂ mt q rt)


For the ?₁ problems, the top 2-node can always accept the result
of the recursive call somehow, and the choice offered by the return
type conveniently matches the node-arity choice, right of the pivot.
For completeness, I give the middle (?₂) and right (?₃) cases
for 3-nodes, but it works just as on the left.


ins23 (s h) y° (no₃ lt p mt q rt) | ge | le
  with ins23 h y° mt
ins23 (s h) y° (no₃ lt p mt q rt) | ge | le
  | ◁ mt′ = ◁ (no₃ lt p mt′ q rt)
ins23 (s h) y° (no₃ lt p mt q rt) | ge | le
  | ▷ (mlt ‚ r ‚ mrt) = ▷ (no₂ lt p mlt ‚ r ‚ no₂ mrt q rt)
ins23 (s h) y° (no₃ lt p mt q rt) | ge | ge
  with ins23 h y° rt
ins23 (s h) y° (no₃ lt p mt q rt) | ge | ge
  | ◁ rt′ = ◁ (no₃ lt p mt q rt′)
ins23 (s h) y° (no₃ lt p mt q rt) | ge | ge
  | ▷ (rlt ‚ r ‚ rrt) = ▷ (no₂ lt p mt ‚ q ‚ no₂ rlt r rrt)


Pleasingly, the task of constructing suitable return values in 
each of these cases is facilitated by Agda’s type directed search 
gadget, Agsy [10]. There are but two valid outputs constructible 
from the pieces available: the original tree reconstituted, and the 
correct output. 

To complete the efficient sorting algorithm based on 2-3 trees,
we can use a Σ-type to hide the height data, giving us a type which
admits iterative construction.

Tree23 = Σ ℕ λ h → L ²³ h (⊥ , ⊤)
insert : P → Tree23 → Tree23
insert p (h , t) with ins23 h p° t
... | ◁ t′ = h , t′
... | ▷ (lt ‚ r ‚ rt) = s h , no₂ lt r rt
sort : ∀ {F} → μJJ F P → L⁺ (⊥ , ⊤)
sort = flatten ∘ π₂ ∘ foldr insert (0 , no₀)
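
To see the whole insertion in one place, here is an untyped Haskell sketch of the same case analysis. The assumptions are loud ones: heights and bounds are not tracked by the types, Either stands in for the sum in the return type of ins23 (Left for a tree of the same height, Right for a too-tall 2-node presented as its parts), and height is a hypothetical run-time check added so that the balance invariant can at least be tested.

```haskell
data T23 p
  = No0
  | No2 (T23 p) p (T23 p)
  | No3 (T23 p) p (T23 p) p (T23 p)

-- Same height (Left), or one pivot too tall (Right).
type Ins p = Either (T23 p) (T23 p, p, T23 p)

ins23 :: Ord p => p -> T23 p -> Ins p
ins23 y No0 = Right (No0, y, No0)
ins23 y (No2 lt p rt)
  | y <= p = case ins23 y lt of
      Left lt'            -> Left (No2 lt' p rt)
      Right (llt, r, lrt) -> Left (No3 llt r lrt p rt)
  | otherwise = case ins23 y rt of
      Left rt'            -> Left (No2 lt p rt')
      Right (rlt, r, rrt) -> Left (No3 lt p rlt r rrt)
ins23 y (No3 lt p mt q rt)
  | y <= p = case ins23 y lt of
      Left lt'            -> Left (No3 lt' p mt q rt)
      Right (llt, r, lrt) -> Right (No2 llt r lrt, p, No2 mt q rt)
  | y <= q = case ins23 y mt of
      Left mt'            -> Left (No3 lt p mt' q rt)
      Right (mlt, r, mrt) -> Right (No2 lt p mlt, r, No2 mrt q rt)
  | otherwise = case ins23 y rt of
      Left rt'            -> Left (No3 lt p mt q rt')
      Right (rlt, r, rrt) -> Right (No2 lt p mt, q, No2 rlt r rrt)

-- At the root, a too-tall result simply becomes a taller tree.
insert :: Ord p => p -> T23 p -> T23 p
insert p t = case ins23 p t of
  Left t'           -> t'
  Right (lt, r, rt) -> No2 lt r rt

-- Linear-time flattening with an accumulator.
flatten :: T23 p -> [p]
flatten t = go t []
  where
    go No0 ys             = ys
    go (No2 l p r) ys     = go l (p : go r ys)
    go (No3 l p m q r) ys = go l (p : go m (q : go r ys))

sort23 :: (Foldable f, Ord p) => f p -> [p]
sort23 = flatten . foldr insert No0

-- Run-time check of uniform height; Nothing signals imbalance.
height :: T23 p -> Maybe Int
height No0 = Just 0
height (No2 l _ r) = do
  hl <- height l
  hr <- height r
  if hl == hr then Just (hl + 1) else Nothing
height (No3 l _ m _ r) = do
  hl <- height l
  hm <- height m
  hr <- height r
  if hl == hm && hm == hr then Just (hl + 1) else Nothing
```

What the Agda version establishes by type checking (order and uniform height) can here only be observed by testing, for instance by checking that height never returns Nothing on trees built with insert.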


17. Deletion from 2-3 Trees 

Might is right: the omission of deletion from treatments of balanced
search trees is always a little unfortunate [15]. Deletion is a
significant additional challenge because we can lose a key from the
middle of the tree, not just from the fringe of nodes whose children are
leaves. Insertion acts always to extend the fringe, so the problem
is only to bubble an anomaly up from the fringe to the root.
Fortunately, just as nodes and leaves alternate in the traversal of a tree,
so do middle nodes and fringe nodes: whenever we need to delete a
middle node, it always has a neighbour at the fringe which we can
move into the gap, leaving us once more with the task of bubbling
a problem up from the fringe.

Our situation is further complicated by the need to restore the 
neighbourhood ordering invariant when one key is removed. At 
last, we shall need our ordering to be transitive. We shall also need 
a decidable equality on keys. 

data _≡_ {X : Set} (x : X) : X → Set where
  ⟨⟩ : x ≡ x

trans : ∀ {x} y {z} → L (x , y) ⇒ L (y , z) ⇒ ⌜ L (x , z) ⌝
eq? : (x y : P) → x ≡ y + (x ≡ y → 0)
Correspondingly, a small amount of theorem proving is indicated,
ironically, to show that it is sound to throw information about
local ordering away.

Transitivity for bounds. Transitivity we may readily lift to
bounds with a key in the middle:
pattern via p = p , ! , !

trans⊤⊥ : [ (⌜L⌝ᴾ ∧̇ ⌜L⌝ᴾ) →̇ ⌜L⌝ᴾ ]
trans⊤⊥ {_ , ⊤} _ = !
trans⊤⊥ {⊥ , ⊥} _ = !
trans⊤⊥ {⊥ , # u} _ = !
trans⊤⊥ {⊤ , _} (via _) = magic
trans⊤⊥ {# l , # u} (via p) = trans p ∴ !
trans⊤⊥ {# l , ⊥} (via _) = magic
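
The case analysis above can be mirrored by a Boolean test in Haskell. This is a sketch resting on an assumption: leB is my reading of how the order extends to bounds (anything is below the top, the bottom is below anything, embedded keys compare as keys), chosen to agree with the clauses of the Agda function; Bound, leB and transVia are hypothetical names.

```haskell
-- Keys extended with a bottom and a top element.
data Bound p = Bot | Val p | Top deriving (Eq, Show)

-- The key ordering lifted to bounds (assumed definition, see above).
leB :: (p -> p -> Bool) -> Bound p -> Bound p -> Bool
leB _  _       Top     = True
leB le (Val x) (Val y) = le x y
leB _  Bot     _       = True
leB _  _       _       = False

-- Transitivity through a key p in the middle, as an implication:
-- if l is below p and p is below u, then l is below u.
transVia :: (p -> p -> Bool) -> Bound p -> p -> Bound p -> Bool
transVia le l p u =
  not (leB le l (Val p) && leB le (Val p) u) || leB le l u
```

For a transitive le, transVia le l p u is True for all l, p and u. The two magic clauses of the Agda function correspond to the combinations where the premise is already False.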

What is the type of deletion? When we remove an element from 
a 2-3 tree of height n, the tree will often stay the same height, but 
there will be situations in which it must get shorter, becoming a 
3-node or a leaf, as appropriate. 

Del²³ Short²³ : ℕ → Rel P⊤⊥
Del²³ h lu = Short²³ h lu + L ²³ h lu
Short²³ 0 lu = 0
Short²³ (s h) lu = L ²³ h lu

The task of deletion has three phases: finding the key to delete; 
moving the problem to the fringe; plugging a short tree into a tall 
hole. The first of these will be done by our main function, 

del²³ : ∀ {h} → [ L • →̇ L ²³ h →̇ Del²³ h ]
and the second by extracting the extreme right key from a nonempty
left subtree,

extr : ∀ {h} → [ L ²³ (s h) →̇ (Del²³ (s h) ∧̇ ⌜L⌝ᴾ) ]
recovering the (possibly short) remainder of the tree and the
evidence that the key is below the upper bound (which will be the
deleted key). Both of these operations will need to reconstruct trees
with one short subtree, so let us build ‘smart constructors’ for just
that purpose, then return to the main problem.

Rebalancing reconstructors. If we try to reconstruct a 2-node 
with a possibly-short subtree, we might be lucky enough to deliver 
a 2-node, or we might come up short. We certainly will not deliver 
a 3-node of full height and it helps to reflect that in the type. 
Shortness can be balanced out if we are adjacent to a 3-node, but if 
we have only a 2-node, we must give a short answer. 

Re2 : ℕ → Rel P⊤⊥
Re2 h = Short²³ (s h) +̇ (L ²³ h ∧̇ L ²³ h)

d2t : ∀ {h} → [ (Del²³ h ∧̇ L ²³ h) →̇ Re2 h ]
d2t {h} (▷ lp ‚ p ‚ pu) = ▷ (lp ‚ p ‚ pu)
d2t {0} (◁ () ‚ p ‚ pu)
d2t {s h} (◁ lp ‚ p ‚ no₂ pq q qu) = ◁ (no₃ lp p pq q qu)
d2t {s h} (◁ lp ‚ p ‚ no₃ pq q qr r ru)
  = ▷ (no₂ lp p pq ‚ q ‚ no₂ qr r ru)

t2d : ∀ {h} → [ (L ²³ h ∧̇ Del²³ h) →̇ Re2 h ]
t2d {h} (lp ‚ p ‚ ▷ pu) = ▷ (lp ‚ p ‚ pu)
t2d {0} (lp ‚ p ‚ ◁ ())
t2d {s h} (no₂ ln n np ‚ p ‚ ◁ pu) = ◁ (no₃ ln n np p pu)
t2d {s h} (no₃ lm m mn n np ‚ p ‚ ◁ pu)
  = ▷ (no₂ lm m mn ‚ n ‚ no₂ np p pu)

rd : ∀ {h} → [ Re2 h →̇ Del²³ (s h) ]
rd (◁ s) = ◁ s
rd (▷ (lp ‚ p ‚ pu)) = ▷ (no₂ lp p pu)

The adaptor rd allows us to throw away the knowledge that the
full height reconstruction must be a 2-node if we do not need it,
but the extra detail allows us to use 2-node reconstructors in the
course of 3-node reconstruction. To reconstruct a 3-node with one
possibly-short subtree, rebuild a 2-node containing the suspect, and
then restore the extra subtree. We thus need to implement the latter.

r3t : ∀ {h} → [ (Re2 h ∧̇ L ²³ h) →̇ Del²³ (s h) ]
r3t (▷ (lm ‚ m ‚ mp) ‚ p ‚ pu) = ▷ (no₃ lm m mp p pu)
r3t (◁ lp ‚ p ‚ pu) = ▷ (no₂ lp p pu)

t3r : ∀ {h} → [ (L ²³ h ∧̇ Re2 h) →̇ Del²³ (s h) ]
t3r (lp ‚ p ‚ ▷ (pq ‚ q ‚ qu)) = ▷ (no₃ lp p pq q qu)
t3r (lp ‚ p ‚ ◁ pu) = ▷ (no₂ lp p pu)

Cutting out the extreme right. We may now implement extr, 
grabbing the rightmost key from a tree. I use 
pattern Ir r = r, Ir , ! 

to keep the extracted element on the right and hide the ordering 
proofs. 

extr : ∀ {h} → [ L ²³ (s h) →̇ (Del²³ (s h) ∧̇ ⌜L⌝ᴾ) ]
extr {0} (no 2 Ir r noo) = < Ir for 

extr {0} (no3 Ip p pr r noo) *= > (no 2 Ip p pr) for 
extr {s A} (no 2 Ip p pu) with extr pu 
... | pr for = rd (t2d {Ip.p.pr)) for 
extr {s A} (n03 Ip p pq q qu) with extr qu 
... | qr for = t3r (Zp.p, t2d {pq. q. qr)) for 
To delete the pivot key from between two trees, we extract the 
rightmost key from the left tree, then weaken the bound on the right 
tree (traversing its left spine only). Again, we are sure that if the 
height remains the same, we shall deliver a 2-node, 
delp : ∀ {h} → [ (L ²³ h ∧̇ L ²³ h) →̇ Re2 h ]
delp {0} {lu} (no₀ ‚ p ‚ no₀) = trans⊤⊥ {lu} (via p) ∴ ◁ no₀
delp {s A} {Ip.p.pu) with extr Ip 
... | Ir for = d2t {Ir, r. weak pu) where 
  weak : ∀ {h u} → L ²³ h (# p , u) → L ²³ h (# r , u)
  weak {0} {u} no₀ = trans⊤⊥ {# r , u} (via p) ∴ no₀
  weak {s h} ⟨ pq ‚ q ‚ qu ⟩ = ⟨ weak pq ‚ q ‚ qu ⟩

A remark on weakenings. It may seem regrettable that we have 
to write weak, which is manifestly an obfuscated identity function, 
and programmers who do not wish the ordering guarantees are 
entitled not to pay and not to receive. If we took an extrinsic 
approach to managing these invariants, weak would still be present, 
but it would just be the proof of the proposition that you can lower 
a lower bound that you know for a tree. Consequently, the truly 
regrettable thing about weak is not that it is written but that it is 
executed. The ‘colored’ analysis of Bernardy and Moulin offers a 
suitable method to ensure that the weakening operation belongs 
to code which is erased at run time [3]. An alternative might
be a notion of ‘propositional subtyping’, allowing us to establish 
coercions between types which are guaranteed erasable at runtime 
because all they do is fix up indexing and the associated content- 
free proof objects. 

The completion of deletion. Now that we can remove a key, 
we need only find the key to remove. I have chosen to delete 
the topmost occurrence of the given key, and to return the tree 
unscathed if the key does not occur at all. 

As with insertion, the discipline of indexing by bounds and 
height is quite sufficient to ensure in silence that rebalancing works 
as required. Indeed, no further explicit proof effort is needed: once 
delp reestablishes the local ordering invariant around the deleted 
element, the rest of the ordering evidence stays intact from input to 
output. 
del²³ : ∀ {h} → [ L • →̇ L ²³ h →̇ Del²³ h ]
del²³ {0} _ no₀ = ▷ no₀
del²³ {s h} y° ⟨ lp ‚ p ‚ pu ⟩ with eq? y p
del²³ {s h} .p° (no₂ lp p pu) | ◁ ⟨⟩
  = rd (delp (lp ‚ p ‚ pu))
del²³ {s h} .p° (no₃ lp p pq q qu) | ◁ ⟨⟩
  = r3t (delp (lp ‚ p ‚ pq) ‚ q ‚ qu)
del²³ {s h} y° ⟨ lp ‚ p ‚ pu ⟩ | ▷ _ with owoto y p
del²³ {s h} y° (no₂ lp p pu) | ▷ _ | le
  = rd (d2t (del²³ y° lp ‚ p ‚ pu))
del²³ {s h} y° (no₂ lp p pu) | ▷ _ | ge
  = rd (t2d (lp ‚ p ‚ del²³ y° pu))
del²³ {s h} y° (no₃ lp p pq q qu) | ▷ _ | le
  = r3t (d2t (del²³ y° lp ‚ p ‚ pq) ‚ q ‚ qu)
del²³ {s h} y° (no₃ lp p pq q qu) | ▷ _ | ge with eq? y q
del²³ {s h} .q° (no₃ lp p pq q qu) | ▷ _ | ge | ◁ ⟨⟩
  = t3r (lp ‚ p ‚ delp (pq ‚ q ‚ qu))
... | ▷ _ with owoto y q
... | le = r3t (t2d (lp ‚ p ‚ del²³ y° pq) ‚ q ‚ qu)
... | ge = t3r (lp ‚ p ‚ t2d (pq ‚ q ‚ del²³ y° qu))


At no point did we need to construct trees with the invariant 
broken. Rather, we chose types which expressed with precision 
the range of possible imbalances arising locally from a deletion. 
It is exactly this precision which allowed us to build and justify 
the rebalancing reconstruction operators we reused so effectively 
to avoid an explosion of cases. 


18. Discussion 

We have seen intrinsic dependently typed programming at work. 
Internalizing ordering and balancing invariants to our datatypes, 
we discovered not an explosion of proof obligations, but rather 
that unremarkable programs check at richer types because they 
accountably do the testing which justifies their choices. 

Of course, to make the programs fit neatly into the types, we 
must take care of how we craft the latter. I will not pretend for one 
moment that the good definition is the first to occur to me, and it is 
certainly the case that one is not automatically talented at designing 
dependent types, even when one is an experienced programmer 
in Haskell or ML. There is a new skill to learn. Hopefully, by 
taking the time to explore the design space for ordering invariants, 
I have exposed some transferable lessons. In particular, we must 
overcome our type inference training and learn to see types as 
pushing requirements inwards, as well as pulling guarantees out. 

It is positive progress that work is shifting from the program
definitions to the type definitions, cashing out in our tools as
considerable mechanical assistance in program construction. A precise
type structures its space of possible programs so tightly that an
interactive editor can often offer us a small choice of plausible
alternatives, usually including the thing we want. It is exhilarating
being drawn to one’s code by the strong currents of a good design.
But that happens only in the last iteration: we are just as efficiently
dashed against the rocks by a bad design, and the best tool to
support recovery remains, literally, the drawing board. We should give
more thought to machine-assisted exploration.

A real pleasure to me in doing this work was the realisation that 
I not only had ‘a good idea for ordered lists’ and ‘a good idea for 
ordered trees’, but that they were the same idea, and moreover that 
I could implement the idea in a datatype-generic manner. The key 
underpinning technology is first-class datatype description. By the 
end of the paper, we had just one main datatype μ≤IO, whose sole role
was to ‘tie the knot’ in a recursive node structure determined by a
computable code. The resulting raw data are strewn with artefacts
of the encoding, but pattern synonyms do a remarkably good job
of recovering the appearance of bespoke constructors whenever we
work specifically to one encoded datatype.

Indeed, there is clearly room for even more datatype-generic
technology in the developments given here. On the one hand,
the business of finding the substructure in which a key belongs,
whether for insertion or deletion, is crying out for a generic
construction of Gérard Huet’s ‘zippers’ [7]. Moreover, the treatment of
ordered structures as variations on the theme of the binary search
tree demands consideration in the framework of ‘ornaments’, as
studied by Pierre-Évariste Dagand and others [5]. Intuitively, it
seems likely that the IO universe corresponds closely to the
ornaments on node-labelled binary trees which add only finitely many
bits (because IO has ‘+ rather than a general Σ). Of course, one
node of a μ≤IO type corresponds to a region of nodes in a tree:
perhaps ornaments, too, should be extended to allow the unrolling of
recursive structure.

Having developed a story about ordering invariants to the
extent that our favourite sorting algorithms silently establish them, we
still do not have total correctness intrinsically. What about
permutation? It has always maddened me that the insertion and flattening
operations manifestly construct their output by rearranging their
input: the proof that sorting permutes should thus be by inspection.
Experiments suggest that many sorting algorithms can be expressed
in a domain specific language whose type system is linear for keys.
We should be able to establish a general purpose permutation
invariant for this language, once and for all, by a logical relations
argument. We are used to making sense of programs, but it is we
who make the sense, not the programs. It is time we made programs
make their own sense.